Abstract. We present a new method for computing core universal solutions in data exchange 
settings specified by source-to-target dependencies, by means of SQL queries. Unlike previously 
known algorithms, which are recursive in nature, our method can be implemented directly on 
top of any DBMS. Our method is based on the new notion of a laconic schema mapping. A 
laconic schema mapping is a schema mapping for which the canonical universal solution is the 
core universal solution. We give a procedure by which every schema mapping specified by FO s-t 
tgds can be turned into a laconic schema mapping specified by FO s-t tgds that may refer to a 
linear order on the domain of the source instance. We show that our results are optimal, in the 
sense that the linear order is necessary and the method cannot be extended to schema mappings 
involving target constraints. 



1 Introduction 

We present a new method for computing core universal solutions in data exchange settings specified by 
source-to-target dependencies, by means of SQL queries. Unlike previously known algorithms, which 
are recursive in nature, our method can be implemented directly on top of any DBMS. Our method is 
based on the new notion of a laconic schema mapping. A laconic schema mapping is a schema mapping 
for which the canonical universal solution is the core universal solution. We give a procedure by which 
every schema mapping specified by FO s-t tgds can be turned into a laconic schema mapping specified 
by FO s-t tgds that may refer to a linear order on the domain of the source instance. 

Outline of the paper: In Section 2, we recall basic notions and facts about schema mappings. Section 3 
explains what it means to compute a target instance by means of SQL queries, and we state our main 
result. Section 4 introduces the notion of laconicity, and contains some initial observations. In Section 5, 
we present our main result, namely a method for transforming any schema mapping specified by FO 
s-t tgds into a laconic schema mapping specified by FO s-t tgds assuming a linear order. In Section 6, 
we show that our results cannot be extended to the case with target constraints. 



2 Preliminaries 



In this section, we recall basic notions from data exchange and fix our notation. 
Work of the fourth author partly funded by NSF CAREER Award IIS-0347065 and NSF grant IIS-0430994. 



2.1 Instances and homomorphisms 



Fix disjoint infinite sets of constant values Cons and null values Nulls, and let $<$ be a linear order on 
Cons. We consider instances whose values are from $\mathsf{Cons} \cup \mathsf{Nulls}$. We use $\mathrm{dom}(I)$ to denote the set of 
values that occur in facts in the instance $I$. A homomorphism $h : I \to J$, with $I, J$ instances of the 
same schema, is a function $h : \mathsf{Cons} \cup \mathsf{Nulls} \to \mathsf{Cons} \cup \mathsf{Nulls}$ with $h(a) = a$ for all $a \in \mathsf{Cons}$, such that for 
all relations $R$ and all tuples of (constant or null) values $(v_1, \ldots, v_n) \in R^I$, $(h(v_1), \ldots, h(v_n)) \in R^J$. 
Instances $I, J$ are homomorphically equivalent if there are homomorphisms $h : I \to J$ and $h' : J \to I$. 
An isomorphism $h : I \cong J$ is a homomorphism that is a bijection between $\mathrm{dom}(I)$ and $\mathrm{dom}(J)$ and 
that preserves truth of atomic formulas in both directions. Intuitively, nulls act as placeholders for 
actual (constant) values, and a homomorphism from $I$ to $J$ captures the fact that $J$ "contains more, 
or at least as much information" as $I$. 
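
To make the definition concrete, the following Python sketch searches for a homomorphism by brute force. The encoding is an assumption of the sketch and not part of the paper: a fact is a pair (relation name, tuple of values), and nulls are strings that begin with an underscore. The function names `is_null`, `dom` and `find_homomorphism` are likewise hypothetical.

```python
from itertools import product

def is_null(v):
    # Sketch convention (an assumption): nulls are strings starting with "_".
    return isinstance(v, str) and v.startswith("_")

def dom(inst):
    """Set of values occurring in the facts of an instance."""
    return {v for _, args in inst for v in args}

def find_homomorphism(I, J):
    """Return a map on the nulls of I that extends to a homomorphism I -> J, or None.

    Constants are always mapped to themselves, so only nulls need an image.
    The search is exponential in the number of nulls of I.
    """
    nulls = sorted(v for v in dom(I) if is_null(v))
    targets = sorted(dom(J), key=str)
    for image in product(targets, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all((rel, tuple(h.get(v, v) for v in args)) in J for rel, args in I):
            return h
    return None

I = {("S", ("a", "_N1")), ("S", ("a", "_N2"))}
J = {("S", ("a", "_N1"))}
K = {("S", ("b", "_N1"))}

# I and J are homomorphically equivalent; I does not map into K because
# the constant a would have to be sent to b.
equivalent = find_homomorphism(I, J) is not None and find_homomorphism(J, I) is not None
```

The exhaustive search is only meant to mirror the definition; it is not an efficient procedure.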

The fact graph of an instance $I$ is the graph whose nodes are the facts $R\mathbf{v}$ (with $R$ a $k$-ary relation 
and $\mathbf{v} \in (\mathsf{Cons} \cup \mathsf{Nulls})^k$, $k > 0$) true in $I$, and such that there is an edge between two facts if they 
have a null value in common. 

We will denote by CQ, UCQ, and FO the set of conjunctive queries, unions of conjunctive queries, 
and first-order queries, respectively, and CQ$^{<}$, UCQ$^{<}$, and FO$^{<}$ are defined similarly, except that the 
queries may refer to the linear order. Thus, unless indicated explicitly, it is assumed that queries do 
not refer to the linear order. For any query $q$ and instance $I$, we denote by $q(I)$ the answers of $q$ in $I$, 
and we denote by $q(I)_{\downarrow}$ the ground answers of $q$, i.e., $q(I)_{\downarrow} = q(I) \cap \mathsf{Cons}^k$ for $k$ the arity of $q$. 

2.2 Schema mappings, universal solutions, and certain answers 

Let S and T be disjoint schemas, called the source schema and the target schema. As usual in data 
exchange, whenever we speak of a source instance, we will mean an instance of S whose values belong 
to Cons, and when we speak of a target instance, we will mean an instance of T whose values may come 
from $\mathsf{Cons} \cup \mathsf{Nulls}$. 

A schema mapping is a triple $\mathcal{M} = (S, T, \Sigma_{st})$, where S and T are the source and target schemas 
and $\Sigma_{st}$ is a finite set of sentences of some logical language defining a class of pairs of instances 
$(I, J)$. Here, $(I, J)$ denotes the union of a source instance $I$ and a target instance $J$, which is itself an 
instance over the joint schema $S \cup T$, and the logical languages we consider are presented below. Two 
schema mappings, $\mathcal{M} = (S, T, \Sigma_{st})$ and $\mathcal{M}' = (S, T, \Sigma'_{st})$, are said to be logically equivalent if $\Sigma_{st}$ 
and $\Sigma'_{st}$ are logically equivalent, i.e., satisfied by the same pairs of instances. Given a schema mapping 
$\mathcal{M} = (S, T, \Sigma_{st})$ and a source instance $I$, a solution for $I$ with respect to $\mathcal{M}$ is a target instance $J$ such 
that $(I, J)$ satisfies $\Sigma_{st}$. We denote the set of solutions for $I$ with respect to $\mathcal{M}$ by $\mathrm{Sol}_{\mathcal{M}}(I)$, or simply 
$\mathrm{Sol}(I)$ when the schema mapping is clear from the context. 

The concrete logical languages that we will consider for the specification of $\Sigma_{st}$ are the following. A 
source-to-target tuple generating dependency (s-t tgd) is a first-order sentence of the form 
$\forall \mathbf{x}(\phi(\mathbf{x}) \to \exists \mathbf{y}.\psi(\mathbf{x}, \mathbf{y}))$, where $\phi$ is a conjunction of atomic formulas over S and $\psi$ is a conjunction of atomic 
formulas over T, such that each variable in $\mathbf{x}$ occurs in $\phi$. A more general class of constraints called 
FO s-t tgds is defined analogously, except that the antecedent is allowed to be any FO query over S. 
Similarly, $L$ s-t tgds can be defined for any query language $L$. A LAV s-t tgd, finally, is an s-t tgd in 
which $\phi$ is a single atomic formula. To simplify notation, we will typically drop the universal quantifiers 
when writing ($L$) s-t tgds. 

Given a source instance $I$, a schema mapping $\mathcal{M}$, and a target query $q$, we will denote by 
$\mathrm{certain}_{\mathcal{M},q}(I)$ the set of certain answers to $q$ in $I$ with respect to $\mathcal{M}$, i.e., the intersection 
$\bigcap_{J \in \mathrm{Sol}_{\mathcal{M}}(I)} q(J)$. In other words, a tuple of values is a certain answer to $q$ if it belongs to the set of 
answers of $q$, no matter which solution of $I$ one picks. There are two methods to compute certain 
answers to a conjunctive query. The first method uses universal solutions and the second uses query 
rewriting. 

A universal solution for a source instance $I$ with respect to a schema mapping $\mathcal{M}$ is a solution 
$J \in \mathrm{Sol}_{\mathcal{M}}(I)$ such that, for every $J' \in \mathrm{Sol}_{\mathcal{M}}(I)$, there is a homomorphism from $J$ to $J'$. It was shown 
in [1] that the certain answers for a conjunctive target query can be obtained simply by evaluating 
the query in a universal solution. Moreover, universal solutions are guaranteed to exist for schema 
mappings specified by $L$ s-t tgds, for any query language $L$. 

Theorem 1 ([1]). For all schema mappings $\mathcal{M}$, source instances $I$, conjunctive queries $q$, and 
universal solutions $J \in \mathrm{Sol}(I)$, $\mathrm{certain}_{\mathcal{M},q}(I) = q(J)_{\downarrow}$. 
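
As an illustration of Theorem 1, the sketch below evaluates conjunctive queries on a universal solution and keeps only the ground answers. It rests on assumptions that are not in the paper: facts are encoded as (relation name, tuple of values), nulls are strings starting with an underscore, and the helper names `matches`, `answers` and `ground` are hypothetical. The instance J is the canonical universal solution of the source instance {R(a,b), R(c,c)} under the two dependencies $Rx_1x_2 \to \exists y.(Sx_1y \land Tx_2y)$ and $Rxx \to Sxx$.

```python
def is_null(v):
    # Sketch convention (an assumption): nulls are strings starting with "_".
    return isinstance(v, str) and v.startswith("_")

def matches(atoms, inst, env=None):
    """Yield every assignment of variables to values that maps all atoms to facts of inst."""
    env = env or {}
    if not atoms:
        yield dict(env)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, fact_args in sorted(inst):
        if fact_rel != rel or len(fact_args) != len(args):
            continue
        new_env = dict(env)
        # setdefault binds a fresh variable, or returns the earlier binding to compare.
        if all(new_env.setdefault(x, v) == v for x, v in zip(args, fact_args)):
            yield from matches(rest, inst, new_env)

def answers(head_vars, atoms, inst):
    """q(inst) for the conjunctive query with the given head variables and body atoms."""
    return {tuple(env[x] for x in head_vars) for env in matches(atoms, inst)}

def ground(tuples):
    """Keep only the tuples that consist of constants."""
    return {t for t in tuples if not any(is_null(v) for v in t)}

J = {("S", ("a", "_N1")), ("T", ("b", "_N1")),
     ("S", ("c", "_N2")), ("T", ("c", "_N2")), ("S", ("c", "c"))}

# q1(x, z) :- S(x, z)           q2(x1, x2) :- S(x1, y), T(x2, y)
certain_q1 = ground(answers(("x", "z"), [("S", ("x", "z"))], J))
certain_q2 = ground(answers(("x1", "x2"), [("S", ("x1", "y")), ("T", ("x2", "y"))], J))
```

The second query joins through a null, which is why its answers are ground even though the witnesses for y are not constants.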

Theorem 2 ([1]). For every schema mapping M specified by L s-t tgds, with L any query language, 
and for every source instance I , there is a universal solution for I with respect to M. 

Theorem 2 was shown in [1] for schema mappings specified by s-t tgds but the same argument 
applies to schema mappings specified by $L$ s-t tgds, for any query language $L$. We will discuss concrete 
methods for constructing universal solutions in Section 3. 

The second method for computing certain answers to conjunctive queries is by rewriting the given 
target query to a query over the source that directly computes the certain answers to the original 
query. 

Theorem 3. Let $L$ be any of UCQ, UCQ$^{<}$, FO, FO$^{<}$. Then for every schema mapping $\mathcal{M}$ specified 
by s-t tgds and for every $L$-query $q$ over T, one can compute in exponential time an $L$-query over S 
defining $\mathrm{certain}_{\mathcal{M},q}$. 

There are various ways in which such certain answer queries can be obtained. One possibility is to 
split up the schema mapping $\mathcal{M}$ into a composition $\mathcal{M}_1 \circ \mathcal{M}_2$, with $\mathcal{M}_1$ specified by full s-t tgds and 
$\mathcal{M}_2$ specified by LAV s-t tgds, and then to successively apply the known query rewriting techniques 
of MiniCon [8] and full s-t tgd unfolding (cf. [7]). In [9], an alternative rewriting method was given for 
the case of $L = \mathrm{FO}^{(<)}$, which can be used to compute in polynomial time an FO$^{(<)}$ source query $q'$ 
defining $\mathrm{certain}_{\mathcal{M},q}$ over source instances whose domain contains at least two elements. 

2.3 Core universal solutions 

A source instance can have many universal solutions. Among these, the core universal solution plays 
a special role. A target instance $J$ is said to be a core if there is no proper subinstance $J' \subsetneq J$ 
and homomorphism $h : J \to J'$. There is an equivalent definition in terms of retractions. A subinstance 
$J' \subseteq J$ is called a retract of $J$ if there is a homomorphism $h : J \to J'$ such that for all $a \in \mathrm{dom}(J')$, 
$h(a) = a$. The corresponding homomorphism $h$ is called a retraction. A retract is proper if it is a 
proper subinstance of the original instance. A core of a target instance $J$ is a retract of $J$ that has 
itself no proper retracts. Every (finite) target instance has a unique core, up to isomorphism. Moreover, 
two instances are homomorphically equivalent iff they have isomorphic cores. It follows that, for every 
schema mapping $\mathcal{M}$, every source instance has at most one core universal solution up to isomorphism. 
Indeed, if the schema mapping $\mathcal{M}$ is specified by FO s-t tgds then each source instance has exactly 
one core universal solution up to isomorphism [3]. We will therefore freely speak of the core universal 
solution. 
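
The definition of a core can be turned directly into a (very inefficient) procedure: as long as some homomorphism maps the instance into a proper subinstance, replace the instance by its image. The Python sketch below does exactly this by exhaustive search. It is only meant to illustrate the definition and is not one of the polynomial-time algorithms from the literature. The encoding (facts as pairs of a relation name and a tuple, nulls as strings starting with an underscore) and the function name `core` are assumptions of the sketch.

```python
from itertools import product

def is_null(v):
    # Sketch convention (an assumption): nulls are strings starting with "_".
    return isinstance(v, str) and v.startswith("_")

def core(J):
    """Return a core of the instance J by exhaustive search (exponential time)."""
    J = set(J)
    while True:
        values = sorted({v for _, args in J for v in args}, key=str)
        nulls = [v for v in values if is_null(v)]
        for image in product(values, repeat=len(nulls)):
            h = dict(zip(nulls, image))
            img = {(rel, tuple(h.get(v, v) for v in args)) for rel, args in J}
            if img < J:
                # h is a homomorphism from J into a proper subinstance: shrink and retry.
                J = img
                break
        else:
            # No homomorphism into a proper subinstance exists, so J is a core.
            return J

J = {("S", ("a", "_N1")), ("S", ("a", "_N2")), ("S", ("b", "_N3")), ("S", ("b", "c"))}
C = core(J)
```

On this input the two facts about a collapse into one, and the fact S(b, _N3) is absorbed by S(b, c), so the result has two facts. Which of the two nulls survives depends on the search order, in line with the core being unique only up to isomorphism.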

It has been convincingly argued that, among all universal solutions for a source instance, the core 
universal solution is the preferred solution. One important reason is that the core universal solution is 
the smallest universal solution: if $J$ is the core universal solution for a source instance $I$ with respect to 
a schema mapping $\mathcal{M}$, and $J'$ is any other universal solution for $I$, i.e., one that is not a core, 
then $|J| < |J'|$. Consequently, the core universal solution is the universal solution that is least expensive 
to materialize. We add to this a second virtue of the core universal solution, namely that, among all 
universal solutions, it is the most conservative one in terms of the answers that it assigns to conjunctive 
queries with inequalities. We propose another reason to be interested in the core universal solution, 
namely that it is the solution that satisfies the most dependencies. In many practical data exchange 
settings, one is interested in solutions satisfying certain target dependencies. One way to obtain such 
solutions is to include the relevant target dependencies in the specification of the schema mapping. 



If the target dependencies satisfy certain syntactic requirements (in particular, if they form a weakly 
acyclic set of target tgds and target egds) , then a solution satisfying these target dependencies can be 
obtained by means of the chase. On the other hand, sometimes it happens that the universal solution 
one constructs without taking into account the target dependencies happens to satisfy the target 
dependencies. Whether this happens depends very much on which universal solution is constructed. 
For example, if $\mathcal{M}$ is the schema mapping specified by the s-t tgd $Rx \to \exists y.Syx$, $I$ is any source instance 
and $J$ a universal solution, then the first attribute of $S$ is not necessarily a key in $J$. However, if $J$ is 
the core universal solution, then it will be a key. In fact, it turns out that the core universal solution is 
the universal solution that maximizes the set of valid target dependencies. To make this precise, let a 
disjunctive target dependency be a first-order sentence of the form $\forall \mathbf{x}(\phi(\mathbf{x}) \to \bigvee_i \exists \mathbf{y}_i.\psi_i(\mathbf{x}, \mathbf{y}_i))$, where 
$\phi, \psi_i$ are conjunctions of atomic formulas over the target schema T and/or equalities. Then we have: 

Theorem 4. Let $\mathcal{M}$ be any schema mapping, $I$ be any source instance, $J$ the core universal solution 
of $I$, and $J'$ any other universal solution of $I$, i.e., one that is not a core. Then 

1. Every disjunctive dependency valid on J' is valid on J, and 

2. Some disjunctive dependency valid on J is not valid on J' . 

Proof. The first half of the result follows from the fact that $J$ is a retract of $J'$ and disjunctive 
dependencies are preserved when going from an instance to one of its retracts. This is shown in [5] for 
non-disjunctive embedded dependencies, but the same argument applies to disjunctive dependencies. 
To prove the second half, pick fresh variables $\mathbf{x}$, one for each value (constant or null) in the domain 
of $J'$, and let $\psi(\mathbf{x})$ be the conjunction of all facts that are true in $J'$ under the natural assignment. 
Consider the disjunctive dependency $\forall \mathbf{x}(\psi(\mathbf{x}) \to \bigvee_{i \neq j} (x_i = x_j))$. This disjunctive dependency is 
clearly not true in $J'$ but it is trivially true in $J$, since $J$, being a proper retract of $J'$, contains strictly 
fewer nulls than $J'$. □ 

Concerning the complexity of computing core universal solutions, we have the following: 

Theorem 5 ([3]). For fixed schema mappings specified by FO$^{<}$ s-t tgds, given a source instance, a 
core universal solution can be computed in polynomial time. 

Strictly speaking, in [3] this was only shown for schema mappings specified by s-t tgds. However, the 
same argument applies to FO$^{<}$ s-t tgds. In fact, this holds for richer data exchange settings, where the 
schema mapping specification may also contain target constraints (specifically, target egds and weakly 
acyclic target tgds). Moreover, several algorithms for obtaining core universal solutions in polynomial 
time have been proposed. 

3 Computing universal solutions with SQL queries 

There is a discrepancy between the methods for computing universal solutions commonly presented 
in the data exchange literature, and the methods actually employed by data exchange tools. In the 
data exchange literature, methods for computing universal solutions are often presented in the form 
of chase procedures. In practical implementations such as Clio, on the other hand, it is common to 
compute universal solutions using SQL queries, thus leveraging the capabilities of existing DBMSs. We 
briefly review here both approaches, and explain how canonical universal solutions can be computed 
using SQL queries. 

The simplest and most well known method for computing universal solutions is the naive chase.⁵ 
The algorithm is described in Figure 1. For a source instance $I$ and schema mapping $\mathcal{M}$ specified by 
FO$^{<}$ s-t tgds, the result of applying the naive chase is called the canonical universal solution of $I$ with 
respect to $\mathcal{M}$. Note that the result of the naive chase is unique up to isomorphism, since it depends 
only on the exact choice of fresh nulls. Also note that, even if two schema mappings are logically 
equivalent, they may assign different canonical universal solutions to a source instance. 

⁵ There are also other, more sophisticated versions of the chase, but they will not be relevant for most of what 
we discuss, since we will be interested in computing solutions by means of SQL queries anyway. We will 
briefly mention one variant of the chase later on. 

Input: A schema mapping $\mathcal{M} = (S, T, \Sigma_{st})$ and a source instance $I$ 
Output: A target instance $J$ that is a universal solution for $I$ w.r.t. $\mathcal{M}$ 

$J := \emptyset$; 
for all $\forall \mathbf{x}(\phi(\mathbf{x}) \to \exists \mathbf{y}.\psi(\mathbf{x}, \mathbf{y})) \in \Sigma_{st}$ do 
    for all tuples of constants $\mathbf{a}$ such that $I \models \phi(\mathbf{a})$ do 
        pick a fresh null value $N_i$ for each $y_i$ and add the facts in $\psi(\mathbf{a}, \mathbf{N})$ to $J$ 
    end for 
end for; 
return $J$ 

Fig. 1. Naive chase method for computing universal solutions. 

We will now discuss how canonical universal solutions can be equivalently computed by means of SQL 
queries. The idea is very simple, and before giving a rigorous presentation, we illustrate it by an example. 
Consider the schema mapping specified by the s-t tgds 

$Rx_1x_2 \to \exists y.(Sx_1y \land Tx_2y)$ 
$Rxx \to Sxx$ 
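
For reference, the naive chase of Figure 1 can be sketched in Python and run on these two dependencies. The sketch covers only s-t tgds with a conjunctive left-hand side, and its encoding is an assumption that does not come from the paper: a tgd is a triple (body atoms, existential variables, head atoms), facts are pairs of a relation name and a tuple of values, and fresh nulls are strings of the form `_N1`, `_N2`, and so on. The names `matches` and `naive_chase` are hypothetical.

```python
from itertools import count

def matches(atoms, inst, env=None):
    """Yield every assignment of variables to values that maps all atoms to facts of inst."""
    env = env or {}
    if not atoms:
        yield dict(env)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, fact_args in sorted(inst):
        if fact_rel != rel or len(fact_args) != len(args):
            continue
        new_env = dict(env)
        if all(new_env.setdefault(x, v) == v for x, v in zip(args, fact_args)):
            yield from matches(rest, inst, new_env)

def naive_chase(tgds, source):
    """Canonical universal solution of `source`, following Figure 1."""
    fresh = count(1)
    J = set()
    for body, exvars, head in tgds:
        for env in matches(body, source):
            # One fresh null per existential variable and per satisfying assignment.
            for y in exvars:
                env[y] = f"_N{next(fresh)}"
            J |= {(rel, tuple(env[v] for v in args)) for rel, args in head}
    return J

tgds = [
    ([("R", ("x1", "x2"))], ["y"], [("S", ("x1", "y")), ("T", ("x2", "y"))]),
    ([("R", ("x", "x"))], [], [("S", ("x", "x"))]),
]
source = {("R", ("a", "b")), ("R", ("c", "c"))}
J = naive_chase(tgds, source)
nulls = {v for _, args in J for v in args if v.startswith("_")}
```

For this source instance the first dependency fires twice and the second once, which gives five facts and two nulls.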

We first Skolemize the dependencies, and split them so that the right hand side consists of a single 
conjunct. In this way, we get 

$Rx_1x_2 \to Sx_1f(x_1, x_2)$ 
$Rx_1x_2 \to Tx_2f(x_1, x_2)$ 
$Rxx \to Sxx$ 

Next, for each target relation R we collect the dependencies that contain R in the right hand side, and 
we interpret these as constituting a definition of R. In this way, we get the following definitions of S 
and T: 

$S := \{(x_1, f(x_1, x_2)) \mid Rx_1x_2\} \cup \{(x, x) \mid Rxx\}$ 
$T := \{(x_2, f(x_1, x_2)) \mid Rx_1x_2\}$ 

In general, the definition of a $k$-ary target relation $R \in T$ will be of the shape 

$R := \{(t_1(\mathbf{x}), \ldots, t_k(\mathbf{x})) \mid \phi(\mathbf{x})\} \cup \cdots \cup \{(t'_1(\mathbf{x}'), \ldots, t'_k(\mathbf{x}')) \mid \phi'(\mathbf{x}')\}$     (1) 

where $t_1, \ldots, t_k, \ldots, t'_1, \ldots, t'_k$ are terms and $\phi, \ldots, \phi'$ are first-order queries over the source schema. 
Since FO queries correspond to SQL queries, one can easily use a relational DBMS in order to compute 
the tuples in the relation R. 
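
The definitions of S and T above translate almost literally into SQL. The sketch below runs them with Python's built-in `sqlite3` module on the source instance {R(a,b), R(c,c)}. Several choices here are assumptions made for illustration only: the use of SQLite, the column names `x1` and `x2`, and the representation of the Skolem term $f(x_1, x_2)$ as the string `f(x1,x2)` built by concatenation (which presumes that no constant already looks like such a string).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (x1 TEXT, x2 TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("a", "b"), ("c", "c")])

# S := {(x1, f(x1,x2)) | R x1 x2}  union  {(x, x) | R x x}
S_QUERY = """
SELECT x1, 'f(' || x1 || ',' || x2 || ')' FROM R
UNION
SELECT x1, x1 FROM R WHERE x1 = x2
"""

# T := {(x2, f(x1,x2)) | R x1 x2}
T_QUERY = "SELECT x2, 'f(' || x1 || ',' || x2 || ')' FROM R"

S = set(conn.execute(S_QUERY).fetchall())
T = set(conn.execute(T_QUERY).fetchall())
```

Each disjunct of a definition of shape (1) becomes one `SELECT`, and the union of disjuncts becomes a SQL `UNION`.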

The general idea behind the construction of the FO queries should be clear from the example. 
However, giving a precise definition of what it means to compute a target instance by means of 
SQL queries requires a bit of care. We need to assume some structure on the set of nulls Nulls. Fix 
a countably infinite set of function symbols of arity $n$, for each $n > 0$. For any set $X$, denote by 
$\mathrm{Terms}[X]$ the set of all terms built up from elements of $X$ using these function symbols, and denote 
by $\mathrm{PTerms}[X] \subseteq \mathrm{Terms}[X]$ the set of all proper terms, i.e., those with at least one occurrence of a 
function symbol. For instance, if $g$ is a unary function and $h$ is a binary function, then $h(g(x), y)$, 
$g(x)$ and $x$ belong to $\mathrm{Terms}[\{x, y\}]$, but only the first two belong to $\mathrm{PTerms}[\{x, y\}]$. It is important 
to distinguish between proper terms built up from constants on the one hand and constants on the 
other hand, as the former will be treated as nulls and the latter not. More precisely, we assume that 
$\mathrm{PTerms}[\mathsf{Cons}] \subseteq \mathsf{Nulls}$. Recall that $\mathsf{Cons} \cap \mathsf{Nulls} = \emptyset$. 

Definition 1 ($L$-term interpretation). Let $L$ be any query language. An $L$-term interpretation $\Pi$ 
is a map assigning to each $k$-ary relation symbol $R \in T$ a union of expressions of the form (1) where 
$t_1, \ldots, t_k \in \mathrm{Terms}[\mathbf{x}]$ and $\phi(\mathbf{x})$ is an $L$-query over S. 



Given a source instance $I$, an $L$-term interpretation $\Pi$ generates a target instance $\Pi(I)$, in the 
obvious way. Note that $\Pi(I)$ may contain constants as well as nulls. Although the program specifies 
exactly which nulls are generated, we will consider $\Pi(I)$ only up to isomorphism, and hence the 
meaning of an $L$-term interpretation does not depend on exactly which function symbols it uses. 

The previous example shows 

Proposition 1. Let L be any query language. For every schema mapping specified by L s-t tgds there 
is an L-term interpretation that yields for each source instance the canonical universal solution. 

Incidentally, even for schema mappings specified by SO tgds, as defined in [4], FO-term 
interpretations can be constructed that compute the canonical universal solution. However, the above suffices 
for present purposes. 

On the other hand, 

Proposition 2. No FO-term interpretation yields for each source instance the core universal solution 
with respect to the schema mapping specified by the FO (in fact LAV) s-t tgd $Rxy \to \exists z.(Sxz \land Syz)$. 

Proof. The argument uses the fact that FO formulas are invariant for automorphisms. Let $I$ be the 
source instance whose domain consists of the constants $a, b, c, d$, and such that $R$ is the total relation 
over this domain. Note that every permutation of the domain is an automorphism of $I$. Suppose for 
the sake of contradiction that there is an FO-term interpretation $\Pi$ such that $\Pi(I)$ is the core 
universal solution of $I$. Then the domain of $\Pi(I)$ consists of the constants $a, b, c, d$ and a distinct null 
term, call it $N_{\{x,y\}} \in \mathrm{PTerms}[\mathbf{x}]$, for each pair of distinct constants $x, y \in \{a, b, c, d\}$, and $\Pi(I)$ contains 
the facts $SxN_{\{x,y\}}$ and $SyN_{\{x,y\}}$ for each of these nulls $N_{\{x,y\}}$. Now consider the term $N_{\{a,b\}}$. We 
can distinguish two cases. The first case is where the term $N_{\{a,b\}}$ does not contain any constants as 
arguments. In this case, it follows from the invariance of FO formulas for automorphisms that $\Pi(I)$ 
contains $SxN_{\{a,b\}}$ for every $x \in \{a, b, c, d\}$, which is clearly not true. The second case is where $N_{\{a,b\}}$ 
contains at least one constant as an argument. If $N_{\{a,b\}}$ contains the constant $a$ or $b$ then let $t'$ be 
obtained by switching all occurrences of $a$ and $b$ in $N_{\{a,b\}}$, otherwise let $t'$ be obtained by switching 
all occurrences of $c$ and $d$ in $N_{\{a,b\}}$. Either way, we obtain that there is a second null, namely $t'$, which 
is distinct from $N_{\{a,b\}}$, and which stands in exactly the same relations to $a$ and $b$ as $N_{\{a,b\}}$ does. This 
again contradicts our assumption that $\Pi(I)$ is the core universal solution of $I$. 

Things change in the presence of a linear order. We will show that every schema mapping specified 
by FO$^{<}$ s-t tgds is logically equivalent to a laconic schema mapping specified by FO$^{<}$ s-t tgds, i.e., 
one for which the canonical universal solution is always a core. In particular, given Proposition 1, this 
shows: 

Theorem 6. For every schema mapping specified by FO$^{<}$ s-t tgds there is an FO$^{<}$-term interpretation 
that yields for each source instance the core universal solution. 

In the case of the example from Proposition 2, the FO$^{<}$-term interpretation $\Pi$ computing the core 
universal solution is given by 

$\Pi(S) = \{(x_1, f(x_1, x_2)) \mid (Rx_1x_2 \lor Rx_2x_1) \land x_1 < x_2\}$ 
$\qquad \cup\ \{(x_2, f(x_1, x_2)) \mid (Rx_1x_2 \lor Rx_2x_1) \land x_1 < x_2\}$ 
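
The effect of the order condition can be observed on the source instance used in the proof of Proposition 2, where R is the total relation on {a, b, c, d}. The sketch below transcribes the interpretation $\Pi(S)$ into SQL and, for comparison, also computes the canonical universal solution of the original dependency. The setup is an assumption of the sketch: it uses Python's `sqlite3` module, encodes Skolem terms as strings, and relies on SQLite's default ordering of text values as the linear order on constants. The Skolem symbol `g` for the original dependency is likewise a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (x TEXT, y TEXT)")
consts = ["a", "b", "c", "d"]
conn.executemany("INSERT INTO R VALUES (?, ?)", [(u, v) for u in consts for v in consts])

# Pairs (x1, x2) with (R x1 x2 or R x2 x1) and x1 < x2, then both S-facts per pair.
CORE_S = """
WITH pairs(x1, x2) AS (
    SELECT x, y FROM R WHERE x < y
    UNION
    SELECT y, x FROM R WHERE y < x
)
SELECT x1, 'f(' || x1 || ',' || x2 || ')' FROM pairs
UNION
SELECT x2, 'f(' || x1 || ',' || x2 || ')' FROM pairs
"""

# Canonical universal solution of Rxy -> exists z.(Sxz and Syz): one null per R-fact.
CANONICAL_S = """
SELECT x, 'g(' || x || ',' || y || ')' FROM R
UNION
SELECT y, 'g(' || x || ',' || y || ')' FROM R
"""

core_S = set(conn.execute(CORE_S).fetchall())
canonical_S = set(conn.execute(CANONICAL_S).fetchall())
core_nulls = {n for _, n in core_S}
canonical_nulls = {n for _, n in canonical_S}
```

On this instance the order-based interpretation produces one null per unordered pair of distinct constants, whereas the canonical universal solution of the original dependency produces one null per R-fact.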

Furthermore, we will show that every schema mapping defined by FO s-t tgds whose right-hand side 
contains at most one atomic formula is equivalent to a laconic schema mapping specified by FO s-t tgds, 
and therefore, its core universal solutions can be computed by means of an FO-term interpretation. In 
other words, in this case the linear order is not needed. Note that in the example from Proposition 2, 
the right-hand side of the s-t tgd consists of two atomic formulas. 

In the next section, we formally introduce the notion of laconicity. In Section 5, we show that every 
schema mapping specified by FO$^{<}$ s-t tgds is logically equivalent to a laconic schema mapping specified 
by FO$^{<}$ s-t tgds. 



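     Non-laconic schema mapping                  Logically equivalent laconic schema mapping 

(a)  $Px \to \exists yz.(Rxy \land Rxz)$         (a')  $Px \to \exists y.Rxy$ 

(b)  $Px \to \exists y.Rxy$                      (b')  $Px \to Rxx$ 
     $Px \to Rxx$ 

(c)  $Rxy \to Sxy$                               (c')  $Rxy \to Sxy$ 
     $Px \to \exists y.Sxy$                            $Px \land \neg\exists y.Rxy \to \exists y.Sxy$ 

(d)  $Rxy \to \exists z.Sxyz$                    (d')  $Rxy \land x \neq y \to \exists z.Sxyz$ 
     $Rxx \to Sxxx$                                    $Rxx \to Sxxx$ 

(e)  $Rxy \to \exists z.(Sxz \land Syz)$         (e')  $(Rxy \lor Ryx) \land x < y \to \exists z.(Sxz \land Syz)$ 

Fig. 2. Examples of non-laconic schema mappings and their laconic equivalents. 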

4 Laconicity 

A schema mapping is laconic if the canonical universal solution of a source instance coincides with the 
core universal solution. In particular, for laconic schema mappings the core universal solution can be 
computed using any method for computing canonical universal solutions, such as the ones described 
in Section 3. In this section, we discuss some examples and general observations concerning laconicity, 
in order to make the reader familiar with the notion. In the next section we will focus on constructing 
laconic schema mappings. In particular, we will show there that every schema mapping specified by 
FO$^{<}$ s-t tgds is logically equivalent to a laconic schema mapping specified by FO$^{<}$ s-t tgds. 

Definition 2 (Laconicity). A schema mapping $\mathcal{M}$ is laconic if for every source instance $I$, the canonical 
universal solution of $I$ with respect to $\mathcal{M}$ is a core. 

Note that the definition only makes sense for schema mappings specified by FO$^{<}$ s-t tgds, because 
we have defined the notion of a canonical universal solution only for such schema mappings. 

Examples of laconic and non-laconic schema mappings are given in Figure 2. It is easy to see that 
every schema mapping specified by full s-t tgds only (i.e., s-t tgds without existential quantifiers) is 
laconic. Indeed, in this case, the canonical universal solution does not contain any nulls, and hence is 
guaranteed to be a core. Thus, being specified by full s-t tgds is a sufficient condition for laconicity, 
although a rather uninteresting one. The following provides us with a necessary condition, which 
explains why the schema mapping in Figure 2(a) is not laconic. Given an s-t tgd $\forall \mathbf{x}(\phi \to \exists \mathbf{y}.\psi)$, by 
the canonical instance of $\psi$, we will mean the target instance whose facts are the conjuncts of $\psi$, where 
the $\mathbf{x}$ variables are treated as constants and the $\mathbf{y}$ variables as nulls. 

Proposition 3. If a schema mapping $(S, T, \Sigma_{st})$ specified by s-t tgds is laconic, then for each s-t tgd 
$\forall \mathbf{x}(\phi \to \exists \mathbf{y}.\psi) \in \Sigma_{st}$, the canonical instance of $\psi$ is a core. 

Proof. We argue by contraposition. Suppose the canonical instance $J$ of $\psi$ is not a core. Let $J' \subseteq J$ 
be the core of $J$ and $h : J \to J'$ the corresponding retraction. 

Take any source instance $I$ in which $\phi$ is satisfied under an assignment $g$, and let $K$ be the canonical 
universal solution of $I$. Since $\phi$ is true in $I$ under the assignment $g$ and by the construction of the 
canonical universal solution, we have that $g$ extends to a homomorphism $\hat{g} : J \to K$ sending the $\mathbf{y}$ 
values to disjoint nulls. In fact, we may assume without loss of generality that $\hat{g}(y_i) = y_i$ for each 
$y_i \in \mathbf{y}$. Moreover, by the construction of canonical universal solutions these null values will not play 
any further role in subsequent steps of the chase. In particular, they do not participate in any facts of 
$K$ other than those in the $\hat{g}$-image of $J$. By the $\hat{g}$-image of $J$ we mean the subinstance of $K$ containing 
those facts that are in the image of the homomorphism $\hat{g} : J \to K$. 






Finally, let $K'$ be the subinstance of $K$ in which the $\hat{g}$-image of $J$ is replaced by the $\hat{g}$-image of $J'$. 
Then $h : J \to J'$ naturally extends to a homomorphism $h' : K \to K'$. Since $K'$ is a proper subinstance 
of $K$, we conclude that $K$ is not a core, and therefore, $\mathcal{M}$ is not laconic. □ 

In the case of schema mapping (e) in Figure 2, the linear order is used in order to obtain a logically 
equivalent laconic schema mapping (e'). Note that the schema mapping (e') is order-invariant in the 
sense that the set of solutions of a source instance $I$ does not depend on the interpretation of the $<$ 
relation in $I$, as long as it is a linear order. Still, the use of the linear order cannot be avoided, as 
follows from Proposition 2. What is really going on, in this example, is that the right-hand side of (e) 
has a non-trivial automorphism (viz. the map sending $x$ to $y$ and vice versa), and the conjunct $x < y$ 
in the antecedent of (e') plays, intuitively, the role of a tie-breaker, cf. Section 5.3. 

Testing whether a given schema mapping is laconic is not a tractable problem: 

Proposition 4. Testing laconicity of schema mappings specified by FO s-t tgds is undecidable. It is 
NP-hard already for schema mappings specified by LAV s-t tgds. 

Proof. The first claim is proved by a reduction from the satisfiability problem for first-order logic on 
finite instances, which is undecidable by Trakhtenbrot's theorem. For any first-order formula $\phi(\mathbf{x})$, let 
$\mathcal{M}_{\phi}$ be the schema mapping containing only one dependency, namely $\forall \mathbf{x}(\phi(\mathbf{x}) \to \exists y_1 y_2.(Py_1 \land Py_2))$. 
It is easy to see that $\mathcal{M}_{\phi}$ is laconic iff $\phi$ is not satisfiable. 

The NP-hardness in the case of LAV mappings is proved by a reduction from the core testing 
problem (given a graph, is it a core), which is known to be NP-complete [6]. Consider any graph 
$G = (V, E)$ and let $\exists \mathbf{y}.\phi(\mathbf{y})$ be the Boolean canonical conjunctive query of $G$. Let $\mathcal{M}_G$ be the schema 
mapping whose only dependency is $\forall x.(Px \to \exists \mathbf{y}.(\phi(\mathbf{y}) \land \bigwedge_i Rxy_i))$. Then $\mathcal{M}_G$ is laconic iff $G$ is a 
core. □ 



5 Making schema mappings laconic 

In this section, we present a procedure for transforming any schema mapping $\mathcal{M}$ specified by FO$^{<}$ s-t 
tgds into a logically equivalent laconic schema mapping $\mathcal{M}'$ specified by FO$^{<}$ s-t tgds. To simplify the 
notation, throughout this section, we assume a fixed input schema mapping $\mathcal{M} = (S, T, \Sigma_{st})$, with $\Sigma_{st}$ 
a finite set of FO$^{<}$ s-t tgds. Moreover, we will assume that the FO$^{<}$ s-t tgds $\forall \mathbf{x}(\phi \to \exists \mathbf{y}.\psi) \in \Sigma_{st}$ are 
non-decomposable [3], meaning that the fact graph of $\exists \mathbf{y}.\psi(\mathbf{x}, \mathbf{y})$ (where the facts are the conjuncts of 
$\psi$ and two facts are connected if they have an existentially quantified variable in common) is connected. 
This assumption is harmless: every FO$^{<}$ s-t tgd can be decomposed into a logically equivalent finite set 
of non-decomposable FO$^{<}$ s-t tgds (with identical left-hand sides, one for each connected component 
of the fact graph) in polynomial time. 

The outline of the procedure for making schema mappings laconic is as follows (the items correspond 
to subsections of the present section): 

1. Construct a finite list of "fact block types": descriptions of potential fact blocks in core universal 
solutions. 

2. Compute for each of the fact block types a precondition: a first-order formula over the source 
schema that tells exactly when the core universal solution will contain a fact block of the given 
type. 

3. If any of the fact block types has non-trivial automorphisms, add an additional side condition, 
consisting of a Boolean combination of formulas of the form $x_i < x_j$, in order to avoid that 
multiple copies of the same fact block are created in the canonical universal solution. 

4. Construct the new schema mapping $\mathcal{M}' = (S, T, \Sigma'_{st})$, where $\Sigma'_{st}$ contains an FO$^{<}$ s-t tgd for each 
of the fact block types. The left-hand side of the FO$^{<}$ s-t tgd is the conjunction of the precondition 
and side condition of the respective fact block type, while the right-hand side is the fact block type 
itself. 



We illustrate the approach by means of an example. The technical notions that we use in discussing 
the example will be formally defined in the next subsections. 

Example 1. Consider the schema mapping $\mathcal{M} = (\{P, Q\}, \{R_1, R_2\}, \Sigma_{st})$, where $\Sigma_{st}$ consists of the 
dependencies 

$Px \to \exists y.R_1xy$ 

$Qx \to \exists yzu.(R_2xy \land R_2zy \land R_1zu)$ 

In this case, there are exactly three relevant fact block types. They are listed below, together with 
their preconditions. 

Fact block type                                  Precondition 

$t_1(x; y) = \{R_1xy\}$                          $\mathrm{pre}_{t_1}(x) = Px$ 
$t_2(x; yzu) = \{R_2xy, R_2zy, R_1zu\}$          $\mathrm{pre}_{t_2}(x) = Qx \land \neg Px$ 
$t_3(x; y) = \{R_2xy\}$                          $\mathrm{pre}_{t_3}(x) = Qx \land Px$ 

We use the notation t(x; y) for a fact block type to indicate that the variables x stand for constants 
and the variables y stand for distinct nulls. 

As it happens, the above fact block types have no non-trivial automorphisms. Hence, no side 
conditions need to be added, and $\Sigma'_{st}$ will consist of the following FO s-t tgds: 

$Px \to \exists y.R_1xy$ 

$Qx \land \neg Px \to \exists yzu.(R_2xy \land R_2zy \land R_1zu)$ 

$Qx \land Px \to \exists y.(R_2xy)$ 

The reader may verify that in this case, the obtained schema mapping is indeed laconic. We will prove 
in Section 5.4 that the output of our transformation is guaranteed to be a laconic schema mapping 
that is logically equivalent to the input schema mapping. 
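
The difference between the two schema mappings of Example 1 can be checked on a small source instance. The sketch below computes the canonical universal solution for both sets of dependencies over the source instance {P(a), Q(a), Q(b)}. It is an illustration under assumptions that are not from the paper: left-hand sides are given as Python predicates standing in for the first-order formulas, every dependency has the single free variable x, nulls are strings of the form `_N1`, `_N2`, and so on, and the names `canonical_solution`, `holds_P`, `holds_Q`, `q_and_not_p` and `q_and_p` are hypothetical.

```python
from itertools import count

def canonical_solution(tgds, source):
    """Naive chase for tgds given as (precondition, existential variables, head atoms)."""
    fresh = count(1)
    consts = sorted({v for _, args in source for v in args})
    J = set()
    for pre, exvars, head in tgds:
        for a in consts:
            if pre(source, a):
                env = {"x": a}
                env.update({y: f"_N{next(fresh)}" for y in exvars})
                J |= {(rel, tuple(env[v] for v in args)) for rel, args in head}
    return J

def holds_P(src, a):
    return ("P", (a,)) in src

def holds_Q(src, a):
    return ("Q", (a,)) in src

def q_and_not_p(src, a):
    return holds_Q(src, a) and not holds_P(src, a)

def q_and_p(src, a):
    return holds_Q(src, a) and holds_P(src, a)

T1 = [("R1", ("x", "y"))]
T2 = [("R2", ("x", "y")), ("R2", ("z", "y")), ("R1", ("z", "u"))]
T3 = [("R2", ("x", "y"))]

original = [(holds_P, ["y"], T1), (holds_Q, ["y", "z", "u"], T2)]
laconic = [(holds_P, ["y"], T1), (q_and_not_p, ["y", "z", "u"], T2), (q_and_p, ["y"], T3)]

source = {("P", ("a",)), ("Q", ("a",)), ("Q", ("b",))}
J_original = canonical_solution(original, source)
J_laconic = canonical_solution(laconic, source)
```

For the constant a, the original mapping creates a block of type $t_2$ that can be folded onto the facts $R_1 a N$ and $R_2 a N'$, so its canonical universal solution has seven facts while the core has five. The transformed mapping produces the five facts directly.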

We will now proceed to define all the notions appearing in this example. 

5.1 Generating the fact block types 

Recall that the fact graph of an instance is the graph whose nodes are the facts of the instance, and 
such that there is an edge between two facts if they have a null value in common. A fact block, or f-block 
for short, of an instance is a connected component of the fact graph of the instance. We know from 
[2] that, for any schema mapping $\mathcal{M}$ specified by FO$^<$ s-t tgds, the size of f-blocks in core universal
solutions for $\mathcal{M}$ is bounded. Consequently, there is a finite number of f-block types, such that every
core universal solution consists of f-blocks of these types. This is a crucial observation that we will 
exploit in our construction. 
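
The definition of f-blocks can be made concrete with a small sketch that computes the connected components of the fact graph. The representation of facts as tuples and the caller-supplied `is_null` predicate are assumptions of this sketch.

```python
def f_blocks(facts, is_null):
    """Split an instance into f-blocks (connected components of the fact graph).

    `facts` is an iterable of tuples (relation, arg1, ..., argn);
    `is_null` is a caller-supplied predicate telling nulls from constants.
    """
    facts = list(dict.fromkeys(facts))  # drop duplicates, keep order
    parent = list(range(len(facts)))    # union-find over fact indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_fact_with = {}  # null -> index of the first fact mentioning it
    for i, fact in enumerate(facts):
        for value in fact[1:]:
            if is_null(value):
                j = first_fact_with.setdefault(value, i)
                parent[find(i)] = find(j)  # facts sharing a null are merged

    blocks = {}
    for i, fact in enumerate(facts):
        blocks.setdefault(find(i), set()).add(fact)
    return [frozenset(b) for b in blocks.values()]
```

Note that two facts sharing only a constant are not connected: only shared nulls create edges in the fact graph.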

Formally, an f-block type t(x; y) will be a finite set of atomic formulas in x, y, where x and y are 
disjoint sets of variables. We will refer to x as the constant variables of t and y as the null variables. 
We say that an f-block type t(x; y) is a renaming of an f-block type t'(x'; y') if there is a bijection
f between x and x' and between y and y', such that $t' = \{R(f(v)) \mid R(v) \in t\}$. In this case, we
write $f : t \cong t'$ and we also call f a renaming. We will not distinguish between f-block types that are
renamings of each other. We say that an f-block B has type t(x; y) if B can be obtained from t(x; y)
by replacing constant variables by constants and null variables by distinct nulls, i.e., if B = t(a, N)
for some sequence of constants a and sequence of distinct nulls N. Note that we require the relevant 
substitution to be injective on the null variables but not necessarily on the constant variables. If a 
target instance J contains a block B = t(a, N) of type t(x; y) then we say that t(x; y) is realized in 
J at a. Note that, in general, an f-block type may be realized more than once at a tuple of constants 
a, but this will not happen if the target instance J is a core universal solution. 

We are interested in the f-block types that may be realized in core universal solutions. Eventually,
the schema mapping $\mathcal{M}'$ that we will construct from $\mathcal{M}$ will contain an FO$^<$ s-t tgd for each relevant
f-block type. Not every f-block type as defined above can be realized. We may restrict attention to
a subclass. Below, by the canonical instance of an f-block type t(x; y) we will mean the instance
containing the facts in t(x; y), considering x as constants and y as nulls.

Definition 3. The set $\mathrm{Types}_{\mathcal{M}}$ of f-block types generated by $\mathcal{M}$ consists of all f-block types t(x; y)
satisfying the following conditions:

(a) $\Sigma_{st}$ contains an FO$^<$ s-t tgd $\forall x'(\phi(x') \to \exists y'.\psi(x', y'))$ with $y \subseteq y'$, and t(x; y) is the set of
conjuncts of $\psi$ in which the variables $y' \setminus y$ do not occur;

(b) The canonical instance of t(x; y) is a core;

(c) The fact graph of the canonical instance of t(x; y) is connected.

If some f-block types generated by $\mathcal{M}$ are renamings of each other, we add only one of them to $\mathrm{Types}_{\mathcal{M}}$.

The main result of this subsection is: 

Proposition 5. Let J be a core universal solution of a source instance I with respect to $\mathcal{M}$. Then
each f-block of J has type t(x; y) for some $t(x; y) \in \mathrm{Types}_{\mathcal{M}}$.

Proof. Let B be any f-block of J. Since J is a core universal solution, it is, up to isomorphism, an
induced subinstance of the canonical universal solution J' of I. It follows that J' must have an f-block
B' such that B is the restriction of B' to the domain of J. Since B' is a connected component of the fact
graph of J', it must have been created in a single step during the naive chase. In other words, there is
an FO$^<$ s-t tgd

\[ \forall x(\phi(x) \to \exists y.\psi(x, y)) \]

and an assignment g of constants to the variables x and distinct nulls to the variables y such that B'
is contained in the set of conjuncts of $\psi(g(x), g(y))$. Moreover, since we assume the FO$^<$ s-t tgds of
$\mathcal{M}$ to be non-decomposable and B' is a connected component of the fact graph of J', B' must be
exactly the set of facts listed in $\psi(g(x), g(y))$. In other words, if we let t(x; y) be the set of all facts
listed in $\psi$, then B' has type t(x; y). Finally, let $t'(x'; y') \subseteq t(x; y)$ be the set of all facts from t(x; y)
containing only variables $y_i$ for which $g(y_i)$ occurs in B. Since B is the restriction of B' to the domain
of J, we have that B is of type t'(x'; y'). Moreover, the fact graph of the canonical instance of t'(x'; y') is
connected because B is connected, and the canonical instance of t'(x'; y') is a core because, if it were
not, then B would not be a core either, and hence J would not be a core either, which would lead
to a contradiction. It follows that $t'(x'; y') \in \mathrm{Types}_{\mathcal{M}}$. □

Note that $\mathrm{Types}_{\mathcal{M}}$ contains only finitely many f-block types. Still, the number is in general
exponential in the size of the schema mapping, as the following example shows.

Example 2. Consider the schema mapping specified by the following s-t tgds:

\[ P_ix \to P'_ix \quad \text{(for each } 1 \le i \le k\text{)} \]

\[ Qx \to \exists y_0 y_1 \ldots y_k \bigl(Rxy_0 \land \bigwedge_{1 \le i \le k} (Ry_iy_0 \land P'_iy_i)\bigr) \]

For each $S \subseteq \{1, \ldots, k\}$, the f-block type

\[ t_S(x; (y_i)_{i \in S \cup \{0\}}) = \{Rxy_0\} \cup \{Ry_iy_0,\ P'_iy_i \mid i \in S\} \]

belongs to $\mathrm{Types}_{\mathcal{M}}$. Indeed, each of these $2^k$ f-block types is realized in the core universal solution
of some source instance. The example can be modified to use a fixed schema: replace $P'_ix$ by
$Sxx_1 \land Sx_1x_2 \land \ldots \land Sx_{i-1}x_i \land Sx_ix_i$. ⊣
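
To make the exponential growth in Example 2 tangible, the following sketch enumerates the f-block types $t_S$ for a given k. The tuple encoding of atoms and the relation names `P'1`, `P'2`, ... are our own conventions for this illustration.

```python
from itertools import combinations

def block_types(k):
    """Enumerate the f-block types t_S of Example 2, one per subset S of {1..k}.

    Atoms are tuples; 'x' is the constant variable, 'y0'..'yk' are null
    variables, and "P'i" names the i-th target relation (naming is ours).
    """
    types = []
    for size in range(k + 1):
        for subset in combinations(range(1, k + 1), size):
            atoms = {("R", "x", "y0")}
            for i in subset:
                atoms.add(("R", f"y{i}", "y0"))
                atoms.add((f"P'{i}", f"y{i}"))
            types.append(frozenset(atoms))
    return types
```

Calling `block_types(k)` returns one atom set per subset S, so the list doubles in length each time k grows by one.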

The same example can be used to show that the smallest logically equivalent schema mapping that 
is laconic can be exponentially longer. 



5.2 Computing the precondition of an f-block type 



Recall that, to simplify notation, we assume a fixed schema mapping $\mathcal{M}$ specified by FO$^<$ s-t tgds.
The main result of this subsection is the following, which shows that whether an f-block type is realized 
in the core universal solution at a given sequence of constants a is something that can be tested by a 
first-order query on the source. 

Proposition 6. For each $t(x; y) \in \mathrm{Types}_{\mathcal{M}}$ there is an FO$^<$ query $\mathrm{precon}_t(x)$ such that for every
source instance I with core universal solution J, and for every tuple of constants a, the following are
equivalent:

1. $a \in \mathrm{precon}_t(I)$

2. t(x; y) is realized in J at a.

Proof. We first define an intermediate formula $\mathrm{precon}'_t(x)$ that almost satisfies the required properties,
but not quite. For each f-block type t(x; y), let $\mathrm{precon}'_t(x)$ be the following formula:

\[ \mathrm{certain}_{\mathcal{M}}\bigl(\exists y. \bigwedge t\bigr)(x) \;\land\; \bigwedge_{i \neq j} \neg\, \mathrm{certain}_{\mathcal{M}}\bigl(\exists y_{-i}. \bigwedge t[y_i/y_j]\bigr)(x) \]

where $y_{-i}$ stands for the sequence y with $y_i$ removed, and t[u/v] is the result of replacing each
occurrence of u by v in t. By construction, if $\mathrm{precon}'_t(a)$ holds in I, then every universal solution J
satisfies t(a, N) for some sequence of distinct nulls N. Still, it may not be the case that t(x; y)
is realized at a, since it may be that t(a, N) is part of a bigger f-block. To make things more
precise, we introduce the notion of an embedding. For any two f-block types t(x; y) and t'(x'; y'), an
embedding of the first into the second is a function h mapping x into x' and mapping y injectively
into y', such that whenever t contains an atomic formula R(z), then R(h(z)) belongs to t'. The
embedding h is strict if t' contains an atomic formula that is not of the form R(h(z)) for any $R(z) \in t$.
Intuitively, the existence of a strict embedding means that t' describes an f-block that properly contains
the f-block described by t.

Let I be any source instance, J any core universal solution, $t(x; y) \in \mathrm{Types}_{\mathcal{M}}$, and a a sequence
of constants.

Claim 1: If t is realized in J at a, then $a \in \mathrm{precon}'_t(I)$.

Proof of claim: Clearly, since t is realized in J at a and J is a universal solution, the first conjunct of
$\mathrm{precon}'_t$ is satisfied. That the rest of the query is satisfied is also easily seen: otherwise J would not be
a core. End of proof of claim.

Claim 2: If $a \in \mathrm{precon}'_t(I)$, then either t is realized in J at a or some f-block type $t'(x'; y') \in \mathrm{Types}_{\mathcal{M}}$
is realized at a tuple of constants a', and there is a strict embedding $h : t \to t'$ such that $a_i = a'_j$
whenever $h(x_i) = x'_j$.

Proof of claim: It follows from the construction of $\mathrm{precon}'_t$, and the definition of $\mathrm{Types}_{\mathcal{M}}$, that
the witnessing assignment for its truth must send all existential variables to distinct nulls, which belong
to the same block. By Proposition 5, the diagram of this block is a specialization of an f-block type
$t' \in \mathrm{Types}_{\mathcal{M}}$. It follows that t is embedded in t' and a, together with possibly some additional values
in Cons, realizes t'. End of proof of claim.

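We now define $\mathrm{precon}_t(x)$ to be the following formula:

\[ \mathrm{certain}_{\mathcal{M}}\bigl(\exists y. \bigwedge t\bigr)(x) \;\land\; \bigwedge_{i \neq j} \neg\, \mathrm{certain}_{\mathcal{M}}\bigl(\exists y_{-i}. \bigwedge t[y_i/y_j]\bigr)(x) \;\land\; \bigwedge_{\substack{t'(x'; y') \in \mathrm{Types}_{\mathcal{M}} \\ h : t(x; y) \to t'(x'; y') \text{ a strict embedding}}} \neg \exists x' \Bigl( \bigwedge_i x_i = h(x_i) \;\land\; \mathrm{precon}'_{t'}(x') \Bigr) \]

This formula satisfies the required conditions: $a \in \mathrm{precon}_t(I)$ iff t(x; y) is realized in J at a.
The left-to-right direction follows from Claim 2, while the right-to-left direction follows from Claim 1
together with the fact that J is a core. □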
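
The notion of a strict embedding used in the proof above can be checked by brute force on small f-block types. The sketch below is only meant as an illustration; the encoding of a type as a triple (atoms, constant variables, null variables) and the function name are assumptions of this sketch.

```python
from itertools import permutations, product

def strict_embeddings(t, t_prime):
    """Yield all strict embeddings of f-block type t into t_prime.

    A type is a triple (atoms, constant_vars, null_vars), where atoms is a
    set of tuples (relation, var1, ..., varn). This encoding is our own.
    """
    atoms, xs, ys = t
    atoms_p, xs_p, ys_p = t_prime
    xs, ys = list(xs), list(ys)
    # Constant variables may be merged; null variables must map injectively.
    for x_img in product(list(xs_p), repeat=len(xs)):
        for y_img in permutations(list(ys_p), len(ys)):
            h = dict(zip(xs, x_img))
            h.update(zip(ys, y_img))
            image = {(a[0],) + tuple(h[v] for v in a[1:]) for a in atoms}
            if image < set(atoms_p):  # proper subset: an embedding that is strict
                yield h
```

On the types of Example 1, the only strict embedding of $t_3$ into $t_2$ is the identity on x and y, which reflects that the block $\{R_2xy\}$ can occur as part of the larger block described by $t_2$.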



5.3 Computing the side conditions of an f-block type 

The issue we address in this subsection, namely that of non-rigid f-block types, is best explained by 
an example. 

Example 3. Consider again schema mapping (e) in Figure 2. This schema mapping is not laconic,
because, when a source instance contains Rab and Rba, for distinct values a, b, the canonical universal
solution will contain two null values N, each satisfying SaN and SbN, corresponding to the two
assignments $\{x \mapsto a, y \mapsto b\}$ and $\{x \mapsto b, y \mapsto a\}$. The essence of the problem is in the fact that the
right-hand-side of the dependency is, in some sense, symmetric: it is a non-trivial renaming of itself,
the renaming in question being $\{x \mapsto y, y \mapsto x\}$. According to the terminology that we will introduce
below, the right-hand-side of this dependency is non-rigid. Schema mapping (e') from Figure 2 does
not suffer from this problem, because it contains x < y in the antecedent, and we are assuming < to
be a linear order on the values in the source instance. ⊣
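
The effect described in Example 3 can be reproduced with a small sketch of the naive chase. Since Figure 2 is not repeated here, we assume that the dependency has the shape $Rxy \to \exists z.(Sxz \land Syz)$, which matches the facts SaN and SbN mentioned in the example; the function names are hypothetical.

```python
def chase_example3(r_facts, side_condition=lambda x, y: True):
    """Naive chase for the (assumed) tgd  Rxy -> exists z. (Sxz and Syz).

    One fresh null is created per assignment (x, y) that satisfies the
    antecedent and the optional side condition.
    """
    solution = set()
    for n, (x, y) in enumerate(sorted(r_facts)):
        if side_condition(x, y):
            null = f"N{n}"
            solution |= {("S", x, null), ("S", y, null)}
    return solution

def null_count(solution):
    """Number of distinct nulls occurring in the second position of S."""
    return len({fact[2] for fact in solution})
```

With the source facts Rab and Rba, the unrestricted chase creates two nulls with identical neighbourhoods, whereas adding the side condition x < y keeps only one of the two symmetric assignments.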

In order to formalize the intuition exhibited in the above example, we need to introduce some 
terminology. We say that two f-blocks B, B' are copies of each other if there is a bijection f from
Cons to Cons and from Nulls to Nulls such that f(a) = a for all $a \in \mathrm{Cons}$ and
$B' = \{R(f(v_1), \ldots, f(v_k)) \mid R(v_1, \ldots, v_k) \in B\}$. In other words, B' can be obtained from B by renaming null values.

Definition 4. An f-block type t(x; y) is rigid if for any two sequences of constants a, a' and for any
two sequences of distinct nulls N, N', if t(a; N) and t(a'; N') are copies of each other, then a = a'.

The s-t tgd from the above example is easily seen to be non-rigid. Moreover, a simple variation of 
the argument in the above example shows: 

Proposition 7. If an f-block type t(x; y) is non-rigid, then the schema mapping specified by the FO
(in fact LAV) s-t tgd $\forall x(R(x) \to \exists y. \bigwedge t(x; y))$ is not laconic.

In other words, if an f-block type is non-rigid, one cannot simply use it as the right-hand-side of 
an s-t tgd without running the risk of non-laconicity. Fortunately, it turns out that f-block types can 
be made rigid by the addition of suitable side conditions. By a side condition $\Phi(x)$ we will mean a
Boolean combination of formulas of the form $x_i < x_j$ or $x_i = x_j$.

Definition 5. An f-block type t(x; y) is rigid relative to a side condition $\Phi(x)$ if for any two sequences
of constants a, a' satisfying $\Phi(a)$ and $\Phi(a')$ and for any two sequences of distinct nulls N, N', if
t(a; N) and t(a'; N') are copies of each other, then a = a'.

Definition 6. A side condition $\Phi(x)$ is safe for an f-block type t(x; y) if for every f-block t(a, N) of
type t there is an f-block t(a', N') of type t satisfying $\Phi(a')$ such that the two are copies of each other.
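
Definitions 4 to 6 can be explored by brute force over a small ordered domain. The following sketch is a bounded check rather than a proof, and all names in it are hypothetical. The constants in `domain` are assumed not to clash with the null names `N0`, `N1`, ... used internally.

```python
from itertools import permutations, product

def instantiate(atoms, assignment):
    """Replace every variable in the atoms by its assigned value."""
    return frozenset((a[0],) + tuple(assignment[v] for v in a[1:]) for a in atoms)

def check_side_condition(atoms, xs, ys, side, domain):
    """Brute-force test of rigidity and safety over a small ordered domain.

    Returns (rigid, safe) for the f-block type with atom set `atoms`,
    constant variables `xs` and null variables `ys`, relative to `side`.
    """
    nulls = [f"N{i}" for i in range(len(ys))]

    def block(a, null_order):
        return instantiate(atoms, {**dict(zip(xs, a)), **dict(zip(ys, null_order))})

    def copies(a, b):
        # Copies: equal after some renaming (permutation) of the nulls.
        return any(block(a, nulls) == block(b, p) for p in permutations(nulls))

    tuples = list(product(domain, repeat=len(xs)))
    good = [a for a in tuples if side(*a)]
    rigid = all(a == b or not copies(a, b) for a in good for b in good)
    safe = all(any(copies(a, b) for b in good) for a in tuples)
    return rigid, safe
```

For the symmetric type $\{Sx_1y, Sx_2y\}$, the trivial side condition is safe but not rigid, $x_1 < x_2$ is rigid but not safe (it excludes blocks with $x_1 = x_2$), and $x_1 \le x_2$ is both on the tested domain.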

Intuitively, safety means that the side condition is not too strong: whenever an f-block type should
be realized in a core universal solution, there will be at least one way of arranging the variables so that
the side condition is satisfied. The main result of this subsection, which will be put to use in the next
subsection, is the following:

Proposition 8. For every f-block type t(x; y) there is a side condition $\mathrm{sidecon}_t(x)$ such that t(x; y)
is rigid relative to $\mathrm{sidecon}_t(x)$, and $\mathrm{sidecon}_t(x)$ is safe for t(x; y).



Proof. We will construct a sequence of side conditions $\Phi_0(x), \ldots, \Phi_n(x)$ safe for t(x; y), such that
each $\Phi_{i+1}$ logically strictly implies $\Phi_i$, and such that t(x; y) is rigid relative to $\Phi_n(x)$. Note that n is
necessarily bounded by a single exponential function in |x|. For $\Phi_0(x)$ we pick the tautology $\top$, which
is trivially safe for t(x; y).

Suppose that t(x; y) is not rigid relative to $\Phi_i(x)$, for some $i \ge 0$. By definition, this means that
there are two sequences of constants a, a' satisfying $\Phi_i(a)$ and $\Phi_i(a')$ and two sequences of distinct
nulls N, N', such that t(a; N) and t(a'; N') are copies of each other, but a and a' are not the same
sequence, i.e., they differ in some coordinate. Let $\psi(x)$ be the conjunction of all formulas of the form
$x_i < x_j$ or $x_i = x_j$ that are true under the assignment sending x to a, and let $\Phi_{i+1}(x) = \Phi_i(x) \land \neg\psi(x)$.
It is clear that $\Phi_{i+1}$ is strictly stronger than $\Phi_i$. Moreover, $\Phi_{i+1}$ is still safe for t(x; y): consider
any f-block t(b, M) of type t(x; y). Since $\Phi_i$ is safe for t, we can find an f-block t(b', M') of type t such
that $\Phi_i(b')$ holds and the two blocks are copies of each other. If $\neg\psi(b')$ holds, then in fact $\Phi_{i+1}(b')$ holds, and
we are done. Otherwise, we have that t(b', M') is isomorphic to t(a, N) and the preimage of t(a', N')
under this isomorphism will again be a copy of t(b', M') (and therefore also of t(b, M)) that satisfies
$\Phi_i(b') \land \neg\psi(b')$, i.e., $\Phi_{i+1}(b')$. □

Incidentally, we believe the above construction of side conditions is not the most efficient possible,
in terms of the size of the side condition obtained. It can probably be improved.

5.4 Putting things together: constructing the laconic schema mapping

Theorem 7. For each schema mapping $\mathcal{M}$ specified by FO$^<$ s-t tgds, there is a laconic schema mapping
$\mathcal{M}'$ specified by FO$^<$ s-t tgds that is logically equivalent to $\mathcal{M}$.

Proof. We define $\mathcal{M}'$ to consist of the following FO$^<$ s-t tgds. For each $t(x; y) \in \mathrm{Types}_{\mathcal{M}}$, we take
the FO$^<$ s-t tgd

\[ \forall x\bigl(\mathrm{precon}_t(x) \land \mathrm{sidecon}_t(x) \to \exists y. \bigwedge t(x; y)\bigr) \]

In order to show that $\mathcal{M}'$ is laconic and logically equivalent to $\mathcal{M}$ (on structures where < denotes a
linear order), it is enough to show that, for every source instance I, the canonical universal solution
J of I with respect to $\mathcal{M}'$ is a core universal solution for I with respect to $\mathcal{M}$. This follows from the
following three facts:

1. Every f-block of J is a copy of an f-block of the core universal solution of I. This follows from
Proposition 6.

2. Every f-block of the core universal solution of I is a copy of an f-block of J. This follows from
Proposition 5 and Proposition 6, together with the safety part of Proposition 8.

3. No two distinct f-blocks of J are copies of each other. This follows from the rigidity part of
Proposition 8 together with the fact that $\mathrm{Types}_{\mathcal{M}}$ contains no two distinct f-block types that are
renamings of each other. □

Incidentally, if the side conditions are left out, then the resulting schema mapping is still logically
equivalent to the original mapping $\mathcal{M}$, but it may not be laconic. It will still satisfy a weak form of
laconicity: a variant of the chase defined in [1], which only fires dependencies whose right-hand-side is
not yet satisfied, will produce the core universal solution.



6 Target constraints




In this section we consider schema mappings with target constraints and we address the question 
whether our main result can be extended to this setting. The answer will be negative. However, first 
we need to revisit our basic notions, as some subtle issues arise in the case with target dependencies. 



It is clear that we cannot expect to compute core universal solutions for schema mappings with
target dependencies by means of FO$^<$-term interpretations. Even for the simple schema mapping
defined by the s-t tgd $Rxy \to R'xy$ and the full target tgd $R'xy \land R'yz \to R'xz$, computing the core
universal solution means computing the transitive closure of R, which we know cannot be done in FO
logic even on finite ordered structures. Still, we can define a notion of laconicity for schema mappings
with target dependencies. Let $\mathcal{M}$ be any schema mapping specified by a finite set of FO$^<$ s-t tgds $\Sigma_{st}$
and a finite set of target tgds and target egds $\Sigma_t$, and let I be a source instance. We define the canonical
universal solution of I with respect to $\mathcal{M}$ as the target instance (if it exists) obtained by taking the
canonical universal solution of I with respect to $\Sigma_{st}$ and chasing it with the target dependencies $\Sigma_t$.
We assume a standard chase but will not make any assumptions on the chase order. Laconicity is
now defined as before: a schema mapping is laconic if for each source instance, the canonical universal
solution coincides with the core universal solution.
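
The transitive-closure example above can be illustrated with a short sketch of the chase with the full target tgd $R'xy \land R'yz \to R'xz$. The function name and the round counter are additions made for this illustration only.

```python
def chase_transitive(r_facts):
    """Chase R' with the full target tgd  R'xy and R'yz -> R'xz  to a fixpoint.

    Returns the resulting relation and the number of rounds that added facts.
    """
    r_prime = set(r_facts)  # the s-t tgd Rxy -> R'xy copies the source relation
    rounds = 0
    while True:
        new = {(x, z) for (x, y) in r_prime for (y2, z) in r_prime if y == y2}
        if new <= r_prime:  # fixpoint reached: no tgd application adds a fact
            return r_prime, rounds
        r_prime |= new
        rounds += 1
```

On a chain with 4 edges the loop needs 2 productive rounds, and on a chain with 8 edges it needs 3, so the number of rounds grows with the input. This is consistent with the observation that a fixed first-order query cannot compute the result.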

Recall that, according to our main result, (i) every schema mapping $\mathcal{M}$ specified by FO$^<$ s-t
tgds is logically equivalent to a laconic schema mapping $\mathcal{M}'$ specified by FO$^<$ s-t tgds. In particular,
this implies that (ii) for each source instance I, the core universal solution for I with respect to $\mathcal{M}$
is the canonical universal solution for I with respect to $\mathcal{M}'$. For the implication from (i) to (ii) the
requirement of logical equivalence turns out to be stronger than needed: it is enough that $\mathcal{M}$ and
$\mathcal{M}'$ are CQ-equivalent, i.e., have the same core universal solution (possibly undefined) for each source
instance [2]. While CQ-equivalence and logical equivalence coincide for schema mappings specified by
FO$^<$ s-t tgds (as follows from the closure under target homomorphisms), the first is strictly weaker
than the second in the case with target dependencies [2].

Theorem 8. There is a schema mapping $\mathcal{M}$ specified by finitely many LAV s-t tgds and full target
tgds, for which there is no CQ-equivalent laconic schema mapping $\mathcal{M}'$ specified by FO$^<$ s-t tgds, target
tgds and target egds.

Proof (sketch). Let $\mathcal{M}$ be the schema mapping specified by the LAV s-t tgds

- $Rx_1x_2 \to R'x_1x_2$

- $P_ix \to \exists y.\, Q_iy$ for $i \in \{1, 2, 3\}$

and the full target tgds

- $R'xy \land R'yz \to R'xz$

- $R'xx \land Q_1y \to Q_3y$

- $R'xx \land Q_2y \to Q_3y$

For source instances I in which the relations $R, P_1, P_2, P_3$ are non-empty, the core universal solution J
will have the following shape: $J(R')$ is the transitive closure of $I(R)$, and $J(Q_1), J(Q_2), J(Q_3)$ are non-empty.
Moreover, if $I(R)$ contains a cycle, then $J(Q_1) = \{N_1\}$, $J(Q_2) = \{N_2\}$ and $J(Q_3) = \{N_1, N_2\}$
for distinct null values $N_1, N_2$, while if $I(R)$ is acyclic, $J(Q_1)$, $J(Q_2)$ and $J(Q_3)$ are disjoint singleton
sets of nulls.

Suppose for the sake of contradiction that there is a CQ-equivalent laconic schema mapping $\mathcal{M}'$
specified by a finite set of FO$^<$ s-t tgds $\Sigma_{st}$ and a finite set of target tgds and egds $\Sigma_t$. In particular,
for each source instance I, the canonical universal solution of I with respect to $\mathcal{M}'$ is the core universal
solution of I with respect to $\mathcal{M}$. Let n be the maximum quantifier rank of the formulas in $\Sigma_{st}$.

Claim 1: There is a source instance $I_1$ containing a cycle, such that the canonical universal solution
$J_1$ of $I_1$ with respect to $\Sigma_{st}$ contains at least three nulls, one belonging only to $Q_1$, one belonging only
to $Q_2$, and one belonging only to $Q_3$.

The proof of Claim 1 is based on the fact that acyclicity is not first-order definable on finite ordered
structures: take any two source instances $I_1, I_2$ agreeing on all FO$^<$-sentences of quantifier rank n
such that $I_1$ contains a cycle and $I_2$ does not. We may even assume that $P_1, P_2, P_3$ are non-empty in
both instances.



Let $J_1$ and $J_2$ be the canonical universal solutions of $I_1$ and $I_2$ with respect to $\Sigma_{st}$. Then $J_2$ must
contain at least three nulls, one belonging only to $Q_1$, one belonging only to $Q_2$ and one belonging only
to $Q_3$. To see this, note that, first of all, $J_2$ must be a homomorphic pre-image of the core universal
solution of $I_2$ with respect to $\mathcal{M}$. Secondly, if one of the relations $Q_i$ is empty in $J_2$, then the
crucial information that $I_2(P_i)$ is non-empty is lost, in the sense that $J_2$ would be a homomorphic
pre-image of the source instance that is like $I_2$ except that the relation $P_i$ is empty, which implies that
the result of chasing $J_2$ with $\Sigma_t$ must be homomorphically contained in the core universal solution of
this modified source instance with respect to $\mathcal{M}$, which is different from the core universal solution of
$I_2$.

This shows that $J_2$ must contain at least three nulls, one belonging only to $Q_1$, one belonging only
to $Q_2$ and one belonging only to $Q_3$. Each of these nulls must have been created by the application of
a dependency from $\Sigma_{st}$. Since $I_1$ and $I_2$ agree on all FO$^<$-sentences of quantifier rank n, the left-hand-side
of this dependency is also satisfied in $I_1$, and hence the same null is also created in the canonical
universal solution of $I_1$.

Claim 2: Let $J'_1$ be the result of chasing $J_1$ with $\Sigma_t$ (assuming it exists). Then $J'_1$ cannot be the core
universal solution of $I_1$ with respect to $\mathcal{M}$.

The proof of Claim 2 is based on a monotonicity argument. More precisely, we use the fact that the
left-hand-side of each target dependency is a conjunctive query, and hence is preserved under homomorphisms.
Let us assume for the sake of contradiction that $J'_1$ is the core universal solution of $I_1$
with respect to $\mathcal{M}$, which contains exactly two null values, one in $Q_1 \cap Q_3$ and one in $Q_2 \cap Q_3$. Let
$N_1, N_2, N_3$ be null values belonging only to $J_1(Q_1)$, only to $J_1(Q_2)$ and only to $J_1(Q_3)$, respectively. It
is easy to see that, during the chase with $\Sigma_t$, $N_3$ must have been identified with $N_1$ or $N_2$ by means
of a target egd $\phi$. A monotonicity argument shows that the same target egd $\phi$ can be used to identify
the two null values in the core universal solution $J'_1$ (note that the target dependencies cannot refer to
the linear order on the constants). This contradicts the fact that $J'_1$ is the end-result of the chase with
$\Sigma_t$. □
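
The shape of the core universal solution described at the start of the proof can be checked on small instances with the following sketch. It reads the last two target tgds as $R'xx \land Q_iy \to Q_3y$ for i = 1, 2, which is what the described shape of J requires; the null naming scheme and the brute-force core computation are assumptions of this sketch and only practical for tiny instances.

```python
from itertools import product

def chase_theorem8(r, p1, p2, p3):
    """Canonical universal solution for the mapping in the proof of Theorem 8."""
    facts = {("R'", x, y) for (x, y) in r}
    for i, p in enumerate((p1, p2, p3), start=1):
        facts |= {(f"Q{i}", f"N{i}_{a}") for a in p}  # one null per P_i-fact
    changed = True
    while changed:  # chase with the full target tgds until nothing changes
        rp = {f[1:] for f in facts if f[0] == "R'"}
        new = {("R'", x, z) for (x, y) in rp for (y2, z) in rp if y == y2}
        if any(x == y for (x, y) in rp):  # R'xx holds for some x
            new |= {("Q3", f[1]) for f in facts if f[0] in ("Q1", "Q2")}
        changed = not new <= facts
        facts |= new
    return facts

def core(facts, is_null):
    """Shrink an instance to its core by searching for proper endomorphisms."""
    facts = set(facts)
    while True:
        nulls = sorted({v for f in facts for v in f[1:] if is_null(v)})
        values = sorted({v for f in facts for v in f[1:]}, key=str)
        for image_of in product(values, repeat=len(nulls)):
            h = dict(zip(nulls, image_of))
            image = {(f[0],) + tuple(h.get(v, v) for v in f[1:]) for f in facts}
            if image < facts:  # endomorphism onto a proper subinstance
                facts = image
                break
        else:
            return facts  # no proper endomorphism exists: facts is a core
```

On a source instance whose R contains a cycle, the null created for $Q_3$ is folded into one of the other two, while on an acyclic R the three nulls stay separate.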

We expect that similar arguments can be used to find a schema mapping $\mathcal{M}$ specified by a finite set
of LAV s-t tgds and target egds, such that there is no CQ-equivalent laconic schema mapping specified
by a finite set of FO$^<$ s-t tgds, target tgds and target egds.